ESTIMATION OF MULTINOMIAL PROBABILITIES


Khursheed  Alam* 
Clemson  University 


ABSTRACT 


This paper deals with the estimation of the parameters
(cell probabilities) of a multinomial distribution. The maximum
likelihood estimator (MLE) is known to be minimax and admissible
with respect to a quadratic loss function. It is shown that the
MLE is inadmissible with respect to a non-quadratic loss function.
For the parameters of m multinomial distributions being estimated
simultaneously and the loss being quadratic, an estimator
is given which is shown to have smaller risk than the MLE for
all but a small subset of the parameter space, when m is large.

Key  words:  Multinomial  Distribution;  Maximum  Likelihood; 
Admissible  Minimax  Estimators. 


1. Introduction and main results. Let $x = (x_1,\ldots,x_k)'$
be distributed according to a multinomial distribution $M(x,p,n)$
with $k$ cells, where $p = (p_1,\ldots,p_k)'$, $0 \le p_i \le 1$ $(i = 1,\ldots,k)$,
$\sum_{i=1}^{k} p_i = 1$ and $\sum_{i=1}^{k} x_i = n$. For estimating $p$ we consider
below two loss functions, given by

(1.1)  $L(\delta,p) = n \sum_{i=1}^{k} (\delta_i - p_i)^2$

(1.2)  $L^*(\delta,p) = n \sum_{i=1}^{k} (\delta_i - p_i)^2 / p_i$

where $\delta_i = \delta_i(x)$ and $\delta = (\delta_1,\ldots,\delta_k)'$ denotes an estimator of $p$.

Let $\delta^{0} = x/n$ denote the maximum likelihood estimator (MLE) whose
risk is given by

(1.3)  $R(\delta^{0},p) = E\,L(\delta^{0},p) = 1 - \sum_{i=1}^{k} p_i^2$

(1.4)  $R^*(\delta^{0},p) = E\,L^*(\delta^{0},p) = k - 1$.
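The two risks in (1.3) and (1.4) can be checked exactly for small n by summing the loss over every possible count vector. The following is a minimal editorial sketch, not part of the original paper; the helper names (`compositions`, `multinomial_pmf`, `mle_risks`) and the example values of p and n are assumptions chosen for illustration.

```python
from math import factorial, prod

def compositions(n, k):
    """Yield every tuple of k non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(x, p):
    """Probability of the count vector x under cell probabilities p."""
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))

def mle_risks(p, n):
    """Exact risk of x/n under loss (1.1) and under loss (1.2)."""
    risk, risk_star = 0.0, 0.0
    for x in compositions(n, len(p)):
        w = multinomial_pmf(x, p)
        sq = [(xi / n - pi) ** 2 for xi, pi in zip(x, p)]
        risk += w * n * sum(sq)
        risk_star += w * n * sum(s / pi for s, pi in zip(sq, p))
    return risk, risk_star

p = [0.5, 0.3, 0.2]          # illustrative cell probabilities (assumed)
r, r_star = mle_risks(p, n=6)
print(r, 1 - sum(pi * pi for pi in p))   # both values should agree
print(r_star, len(p) - 1)                # both values should agree
```

Under loss (1.2) the risk does not depend on p, which is the property used later to argue that the MLE is minimax among estimators summing to one.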

First we consider the loss (1.2). The covariance matrix
of $x$ is given by $\Sigma = (\sigma_{ij})$, where $\sigma_{ij} = -np_ip_j$ for $i \ne j$ and
$\sigma_{ii} = np_i(1-p_i)$. Clearly, $\Sigma$ is a singular matrix. A generalized
inverse of $\Sigma$ is given by the diagonal matrix $\Sigma^{-}$
whose $i$th diagonal element is equal to $(np_i)^{-1}$, as it can
be verified that $\Sigma\Sigma^{-}\Sigma = \Sigma$. Hence, the loss function (1.2)
represents the Mahalanobis distance function $n^{2}(\delta-p)'\Sigma^{-}(\delta-p)$. It
is also seen that Pearson's chi-square test statistic used for
testing goodness of fit represents the loss due to the MLE.
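As an illustrative check (an editorial sketch with hypothetical helper names and arbitrary example values), the generalized-inverse property and the agreement between the quadratic form in $x - np$, the loss (1.2) evaluated at the MLE, and Pearson's chi-square statistic can be verified numerically:

```python
def covariance(p, n):
    """Covariance of multinomial counts: n*p_i*(1-p_i) on the diagonal,
    -n*p_i*p_j off the diagonal."""
    k = len(p)
    return [[n * p[i] * (1 - p[i]) if i == j else -n * p[i] * p[j]
             for j in range(k)] for i in range(k)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

p, n = [0.5, 0.3, 0.2], 10   # illustrative values (assumed)
k = len(p)
sigma = covariance(p, n)
# Diagonal generalized inverse with entries 1/(n*p_i).
ginv = [[1 / (n * p[i]) if i == j else 0.0 for j in range(k)] for i in range(k)]

product = matmul(matmul(sigma, ginv), sigma)
max_gap = max(abs(product[i][j] - sigma[i][j])
              for i in range(k) for j in range(k))

x = [7, 2, 1]                # one possible sample with sum n
dev = [xi - n * pi for xi, pi in zip(x, p)]
quad_form = sum(d * d / (n * pi) for d, pi in zip(dev, p))  # (x-np)' ginv (x-np)
loss_star = n * sum((xi / n - pi) ** 2 / pi for xi, pi in zip(x, p))
print(max_gap)               # close to zero
print(quad_form, loss_star)  # Pearson's statistic and loss (1.2) agree
```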

Olkin and Sobel (1978) have shown that the maximum likelihood
estimator $\delta^{0}$ is admissible for estimating $p$ with respect to the
loss function (1.2), among all estimators $\delta$ for which

(1.5)  $\sum_{i=1}^{k} \delta_i = 1$.

Since the risk of $\delta^{0}$ is constant, as given by (1.4), the MLE is
also minimax among those estimators. If the condition (1.5) is
removed then $\delta^{0}$ is inadmissible. This is shown, as follows, by
finding an estimator $\delta^*$ which dominates $\delta^{0}$.

The Dirichlet distribution is a conjugate prior distribution
for the parameter of a multinomial distribution. Suppose that $p$
is distributed a priori according to the Dirichlet distribution
$D(p,\nu)$, given by the density function

(1.6)  $f(p,\nu) = \dfrac{\Gamma(k\nu)}{(\Gamma(\nu))^{k}}\,(p_1 \cdots p_k)^{\nu-1}$,  $0 < \nu \le 1$.

A Bayes estimator of $p$ with respect to (1.6) and the loss function
(1.2) is $\delta^*$, given by

(1.7)  $\delta_i^* = \dfrac{\nu + x_i - 1}{n + k\nu - 1}$ for $x_i \ge 1$,  $\delta_i^* = 0$ for $x_i = 0$.


By direct computation we obtain the risk of $\delta^*$, given by

(1.8)  $R^*(\delta^*,p) = n(n+k\nu-1)^{-2}\Bigl[(\nu-1)^2 \sum_{i=1}^{k} \dfrac{1-(1-p_i)^n}{p_i} + 2nk(\nu-1) + n(n+k-1)\Bigr]$
       $\quad - 2n(n+k\nu-1)^{-1}\Bigl[(\nu-1)\sum_{i=1}^{k}\bigl(1-(1-p_i)^n\bigr) + n\Bigr] + n$.

For $\nu = 1$, (1.8) reduces to

$R^*(\delta^*,p) = \dfrac{n(k-1)}{n+k-1} < R^*(\delta^{0},p)$.

For $\nu < 1$ we have from (1.8)

(1.9)  $R^*(\delta^*,p) < n^2(n+k\nu-1)^{-2}\bigl[k(\nu-1)^2 + 2k(\nu-1) + (n+k-1)\bigr] - 2\nu n^2(n+k\nu-1)^{-1} + n$
       $\quad = n^2(n+k\nu-1)^{-2}(k\nu^2+n-1) - 2\nu n^2(n+k\nu-1)^{-1} + n$.

The quantity on the right hand side of the equality in (1.9) is
equal to $n(k-1)/(n+k-1)$ for $\nu = 1$. Therefore, $R^*(\delta^*,p) < R^*(\delta^{0},p)$ for
$1 - \eta \le \nu \le 1$, where $\eta$ is a positive number depending on the values
of $n$ and $k$. Thus $\delta^*$ dominates $\delta^{0}$ for certain values of $\nu$. Note
that $\delta^*$ does not satisfy the condition (1.5). Note also that $\delta^*$
is admissible, being a Bayes estimator.
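The closed form (1.8) can be compared with a brute-force evaluation of the risk of the estimator (1.7). The sketch below is an editorial illustration only; the function names and the example values of p, n and $\nu$ are assumptions, and the enumeration is practical only for small n and k.

```python
from math import factorial, prod

def compositions(n, k):
    """Yield every tuple of k non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(x, p):
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))

def bayes_estimate(x, nu):
    """Estimator (1.7): 0 for empty cells, (nu+x_i-1)/(n+k*nu-1) otherwise."""
    n, k = sum(x), len(x)
    return [(nu + xi - 1) / (n + k * nu - 1) if xi >= 1 else 0.0 for xi in x]

def exact_risk(p, n, nu):
    """Risk under loss (1.2), summed over all count vectors."""
    total = 0.0
    for x in compositions(n, len(p)):
        d = bayes_estimate(x, nu)
        loss = n * sum((di - pi) ** 2 / pi for di, pi in zip(d, p))
        total += multinomial_pmf(x, p) * loss
    return total

def risk_formula(p, n, nu):
    """Closed form (1.8)."""
    k, c = len(p), n + len(p) * nu - 1
    hit = [1 - (1 - pi) ** n for pi in p]      # P(x_i >= 1)
    first = n / c ** 2 * ((nu - 1) ** 2 * sum(h / pi for h, pi in zip(hit, p))
                          + 2 * n * k * (nu - 1) + n * (n + k - 1))
    second = 2 * n / c * ((nu - 1) * sum(hit) + n)
    return first - second + n

p, n = [0.5, 0.3, 0.2], 6    # illustrative values (assumed)
print(exact_risk(p, n, 0.9), risk_formula(p, n, 0.9))   # should agree
print(exact_risk(p, n, 1.0), n * (len(p) - 1) / (n + len(p) - 1))
```

For $\nu = 1$ the value is $n(k-1)/(n+k-1)$, which is below the constant MLE risk $k - 1$ whenever $k > 1$.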



Next, we consider the quadratic loss given by (1.1). Johnson
(1971) and Alam (1978) have shown that the maximum likelihood
estimator is admissible with respect to the quadratic loss.
Steinhaus (1957) and Trybula (1958) have obtained minimax estimators
for the more general loss function of the form $\sum_{i=1}^{k} c_i(\delta_i - p_i)^2$,
where $c_i$ are constants and $\sum_{i=1}^{k} \delta_i = 1$. It should be observed that
any estimator $\delta$ which does not satisfy the condition (1.5) is
inadmissible with respect to the loss function given by (1.1),
since the projection of $\delta$ on the hyperplane $\sum_{i=1}^{k} x_i = 1$ gives an
estimator satisfying (1.5) for which the loss is smaller.

We consider below the problem of estimating simultaneously
the parameters of $m > 2$ multinomial populations. Let $\pi_1,\ldots,\pi_m$
denote the $m$ populations, and let $(p_{i1},\ldots,p_{ik})$ denote the vector
of cell probabilities associated with $\pi_i$, where $\sum_{j=1}^{k} p_{ij} = 1$. A
sample of $n$ observations is taken from each population. Let $x_{ij}$
denote the sample frequency associated with the $j$th cell of $\pi_i$.
The loss is given by $n\sum_{i=1}^{m}\sum_{j=1}^{k}(\delta_{ij}-p_{ij})^2$, equal to $n$ times the
sum of squared errors, where $\delta_{ij}$ denotes an estimate of $p_{ij}$,
depending on the entire set of observations, even though the set
of observations from $\pi_i$ alone seems to be relevant. A sort of
empirical Bayes estimator for the given problem is obtained, as
follows.


Reusing the notation above without risk of confusion, we shall
denote below the MLE and a Bayes estimator for the problem of
simultaneous estimation by $\delta^{0}$ and $\delta^*$, respectively, and let
$p = (p_{11},\ldots,p_{mk})$. Let $y = \sum_{i=1}^{m}\sum_{j=1}^{k} x_{ij}^2$, $q_i = 1 - \sum_{j=1}^{k} p_{ij}^2$
and $q = \sum_{i=1}^{m} q_i$. The MLE is given by $\delta^{0}_{ij} = x_{ij}/n$ and its risk is
given by

(1.10)  $R(\delta^{0},p) = q$.

A Bayes estimator with respect to the Dirichlet prior (1.6) is
given by

(1.11)  $\delta^*_{ij} = (x_{ij}+\nu)/(n+k\nu)$

and by direct computation its risk is given by

(1.12)  $R(\delta^*,p) = n(n+k\nu)^{-2}\bigl[(n-\nu^2k^2)q + \nu^2mk(k-1)\bigr]$.

A value of $\nu$ minimizing (1.12) is given by

(1.13)  $\nu_0 = q\,(m(k-1)-kq)^{-1}$

for which

$R(\delta^*,p) = \dfrac{nq}{n+k\nu_0} < R(\delta^{0},p)$.
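The minimization of (1.12) can be illustrated numerically. The snippet below is a small editorial sketch; the function names `risk_1_12` and `nu_zero` and the example values of q, m, n and k are assumptions made for illustration.

```python
def risk_1_12(nu, q, m, n, k):
    """Risk (1.12) of the Bayes estimator (x_ij + nu)/(n + k*nu)."""
    return n * ((n - nu ** 2 * k ** 2) * q + nu ** 2 * m * k * (k - 1)) / (n + k * nu) ** 2

def nu_zero(q, m, k):
    """Minimizing value (1.13); requires q < m*(k-1)/k."""
    return q / (m * (k - 1) - k * q)

q, m, n, k = 5.0, 10, 10, 3       # illustrative values (assumed)
nu0 = nu_zero(q, m, k)
best = risk_1_12(nu0, q, m, n, k)
print(nu0)                        # 1.0 for these values
print(best, n * q / (n + k * nu0))        # risk at nu0 equals n*q/(n + k*nu0)
print(risk_1_12(0.0, q, m, n, k))         # nu = 0 gives the MLE risk q
print(risk_1_12(0.9, q, m, n, k), risk_1_12(1.1, q, m, n, k))  # both exceed best
```

Setting $\nu = 0$ in (1.11) recovers the MLE, so the comparison with `risk_1_12(0.0, ...)` is the comparison with (1.10).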

Since $\nu$ is unknown, the above inequality suggests that we might use
$\nu_0$ for $\nu$. But $\nu_0$ is also unknown, since $q$ is unknown. An estimate
of $q$ is given by $(mn^2-y)/(n(n-1))$, since its expected value is
equal to $q$. Substituting the estimate for $q$ in (1.13) we get
after simplification

(1.14)  $\nu = \dfrac{mn^2-y}{ky-mn(n+k-1)}$.

Since the value of $\nu$ given by (1.14) is negative for certain
values of $y$, we make a minor modification and finally come up
with a value of $\nu = \lambda$, say, given by

(1.15)  $\lambda = \dfrac{mn^2-y}{ky-mn^2}$.

The empirical Bayes estimator is obtained from $\delta^*$ by substituting
$\lambda$ for $\nu$ in (1.11). We shall denote it by $\delta^{**}$.
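A minimal sketch of the empirical Bayes estimator follows. It is an editorial illustration, not the author's code: the function name `empirical_bayes` is hypothetical, and the estimator is written in terms of $z = (1+k\lambda)^{-1} = (ky - mn^2)/((k-1)mn^2)$, which is algebraically the same as substituting (1.15) into (1.11) but avoids a division by zero when $ky = mn^2$ (an implementation choice assumed here).

```python
def empirical_bayes(counts):
    """Return the matrix of estimates (x_ij + lam)/(n + k*lam) with lam from (1.15).

    counts: list of m rows, each a list of k cell frequencies summing to n.
    """
    m, k = len(counts), len(counts[0])
    n = sum(counts[0])
    y = sum(x * x for row in counts for x in row)
    # z = 1/(1 + k*lam); z = 1 gives the MLE, z = 0 gives the uniform value 1/k.
    z = (k * y - m * n * n) / ((k - 1) * m * n * n)
    return [[(k * z * x + 1 - z) / (k * (1 + (n - 1) * z)) for x in row]
            for row in counts]

counts = [[3, 1, 0], [2, 1, 1]]   # illustrative data: m = 2, k = 3, n = 4
estimate = empirical_bayes(counts)
for row in estimate:
    print(row, sum(row))          # each row sums to 1
```

For the data shown, $y = 16$ and $\lambda = (32-16)/(48-32) = 1$, so each estimate is $(x_{ij}+1)/7$.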

It is shown below in Section 2 that $R(\delta^{**},p) < R(\delta^{0},p)$
for all values of $p$ for which

(1.16)  $q/m > 2n^4/((n-1)^2m^{\beta})$,  $m > n^{2/(1-\beta)}$

where $0 < \beta < 1$. That is, $\delta^{**}$ has smaller risk than $\delta^{0}$ for
sufficiently large values of $m$, except for a set of values of
$p$ approaching the null set as $m \to \infty$. Johnson (1971) has shown
that there is no "Stein effect", that is, there is no estimator
which dominates the MLE for a given value of $m$. This is
essentially for the reason, as Johnson points out, that the
risk of the MLE is small near the boundary of the parameter
space, given by $q = 0$. A numerical comparison of the risk of
$\delta^{**}$ and $\delta^{0}$ is given in Section 3.

The above results are summarized in the following
theorems.

Theorem 1.1. The MLE is admissible with respect to (1.2)
among all estimators satisfying the condition (1.5) but inadmissible
among all estimators and is dominated by $\delta^*$, given by
(1.7).

Theorem 1.2. $R(\delta^{**},p) < R(\delta^{0},p)$ for all values of $p$ for
which (1.16) holds, where $\delta^{**}$ is given by (1.11) with the value
of $\nu$ given by (1.15).

2. Proof of Theorem 1.2. First we give a preliminary result
which will be used in the sequel. Let

(2.1)  $z = (1+k\lambda)^{-1} = \dfrac{ky-mn^2}{(k-1)mn^2}$.

We have

$E(z) = 1 - \Bigl(\dfrac{n-1}{n}\Bigr)\Bigl(\dfrac{k}{k-1}\Bigr)\dfrac{q}{m}$.
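Note that $q/m \le (k-1)/k$. As

$\mathrm{Var}\bigl(\sum_{j=1}^{k} x_{ij}^2\bigr) = E\bigl(\sum_{j=1}^{k} x_{ij}^2\bigr)^2 - \bigl(E\sum_{j=1}^{k} x_{ij}^2\bigr)^2$
$\quad \le n^2\,E\bigl(\sum_{j=1}^{k} x_{ij}^2\bigr) - \bigl(E\sum_{j=1}^{k} x_{ij}^2\bigr)^2$
$\quad \le n^2\bigl(n^2 - E\sum_{j=1}^{k} x_{ij}^2\bigr)$
$\quad = n^3(n-1)\,q_i$

we have

(2.2)  $\mathrm{Var}(z) \le \Bigl(\dfrac{n-1}{n}\Bigr)\Bigl(\dfrac{k}{k-1}\Bigr)^2 \dfrac{q}{m^2}$.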

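The risk of $\delta^{**}$ is given after simplification by

(2.3)  $n^{-1}R(\delta^{**},p) = E\sum_{i=1}^{m}\sum_{j=1}^{k}(\delta^{**}_{ij}-p_{ij})^2$

$\quad = E\bigl[(y+2mn\lambda+mk\lambda^2)(n+k\lambda)^{-2} - 2\sum_{i=1}^{m}\sum_{j=1}^{k} p_{ij}x_{ij}(n+k\lambda)^{-1} - 2m\lambda(n+k\lambda)^{-1} + m - q\bigr]$

$\quad \le E(y+2mn\lambda+mk\lambda^2)(n+k\lambda)^{-2} - 2\bigl(\sum_{i=1}^{m}\sum_{j=1}^{k} p_{ij}E x_{ij}\bigr)E(n+k\lambda)^{-1} - 2mE\lambda(n+k\lambda)^{-1} + m - q$

$\quad = E\bigl[(y+2mn\lambda+mk\lambda^2)(n+k\lambda)^{-2} - 2n(m-q)(n+k\lambda)^{-1} - 2m\lambda(n+k\lambda)^{-1} + m - q\bigr]$

$\quad = E\bigl[(y-mk\lambda^2)(n+k\lambda)^{-2} - 2n(m-q)(n+k\lambda)^{-1} + m - q\bigr]$

$\quad = E\Bigl[nm\Bigl(\dfrac{n}{n-1}\Bigr)\Bigl(\dfrac{k-1}{k}\Bigr)\Bigl(\dfrac{z^2}{1+(n-1)z} - \dfrac{z^2}{(1+(n-1)z)^2}\Bigr) + \Bigl(q - m\Bigl(\dfrac{k-1}{k}\Bigr)\Bigr)\Bigl(\dfrac{2nz}{1+(n-1)z} - 1\Bigr)\Bigr]$.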



,  n  ,  ,k-l.  r2n- 1  , ,  ,n-1.2,  k  .  q.  l,n-l,  .  k  .  q, 

nta)(— (1  -  (~ >  -  n  ~ n-"  k-T  m 


F  (x)  dx 


j0  ( 1+  (n-1)  x) 


3-(-  +  -iy) 

m  n  n-1 


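Let $0 < \varepsilon \le 1/n$. We have

(2.5)  $\int_0^1 \dfrac{F(x)\,dx}{(1+(n-1)x)^2} = \int_0^{E(z)-\varepsilon} \dfrac{F(x)\,dx}{(1+(n-1)x)^2} + \int_{E(z)-\varepsilon}^{E(z)} \dfrac{F(x)\,dx}{(1+(n-1)x)^2} + \int_{E(z)}^{1} \dfrac{F(x)\,dx}{(1+(n-1)x)^2}$,

where $F$ denotes the distribution function of $z$.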


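The first integral on the right hand side of (2.5) is majorized
by

(2.6)  $F(E(z)-\varepsilon) \le \mathrm{Var}(z)/\varepsilon^2$  by Chebyshev's inequality
       $\quad \le \Bigl(\dfrac{n-1}{n}\Bigr)\Bigl(\dfrac{k}{k-1}\Bigr)^2 \dfrac{q}{m^2\varepsilon^2}$  by (2.2).

The second integral is majorized by

(2.7)  $\varepsilon\,(1+(n-1)(E(z)-\varepsilon))^{-2} = \dfrac{\varepsilon}{n^2}\Bigl(1 - \Bigl(\dfrac{n-1}{n}\Bigr)^2\Bigl(\dfrac{k}{k-1}\Bigr)\dfrac{q}{m} - \Bigl(\dfrac{n-1}{n}\Bigr)\varepsilon\Bigr)^{-2}$.

The third integral is majorized by

(2.8)  $\int_{E(z)}^{1}(1+(n-1)x)^{-2}\,dx = \dfrac{1-E(z)}{n(1+(n-1)E(z))} = \dfrac{q}{mn^2}\Bigl(\dfrac{n-1}{n}\Bigr)\Bigl(\dfrac{k}{k-1}\Bigr)\Bigl[1 - \Bigl(\dfrac{n-1}{n}\Bigr)^2\Bigl(\dfrac{k}{k-1}\Bigr)\dfrac{q}{m}\Bigr]^{-1}$.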


Using (2.5) through (2.8) in (2.4) we get

(2.9)  $(mn)^{-1}\bigl(R(\delta^{**},p) - R(\delta^{0},p)\bigr) \le \ldots$

since $\varepsilon \le 1/n$ and $q/m < 1$. Let $Q$ denote the quantity inside the square bracket on the
right side of the second inequality in (2.9). Suppose that

(2.10)  $\dfrac{q}{m} > \dfrac{2n^4}{(n-1)^2 m^{\beta}}$ and $0 < \beta < 1$.

Putting $\varepsilon = m^{(\beta-1)/2}$ in $Q$ we get

$Q \le \ldots < 0$ for $n \ge 2$.


Therefore

$R(\delta^{**},p) - R(\delta^{0},p) < 0$

for all values of $p$ satisfying the inequality (1.16).

3. Numerical comparison of the risk of $\delta^{**}$ and $\delta^{0}$. By Theorem
1.2, if $m$ is large then $R(\delta^{**},p) < R(\delta^{0},p)$ for all but a small
subset of the values of $p$ near the boundary of the parameter
space, given by $q = 0$. In many practical situations requiring
simultaneous estimation of the parameters of several multinomial
populations the value of $q$ is a priori bounded away from zero so
that the inequality holds for moderately large values of $m$.
Therefore, $\delta^{**}$ should be ordinarily preferred to $\delta^{0}$. Let

$\rho(\delta^{0},\delta^{**}) = \dfrac{R(\delta^{0},p) - R(\delta^{**},p)}{R(\delta^{0},p)}$

denote the relative saving in the risk of $\delta^{**}$. The following
table gives for illustration 8 sets of values of $\rho(\delta^{0},\delta^{**})$, computed
by Monte Carlo method for $m = 10$, $n = 10, 20$ and $k = 2, 3, 4$
with the values of $p$ being chosen randomly. It is seen from the
table that there is considerable saving in the risk due to $\delta^{**}$.
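A Monte Carlo estimate of the relative saving can be sketched as follows. This is an editorial illustration and does not reproduce the paper's table: the function names, the seed, the number of replications, and the choice of equal cell probabilities in every population are all assumptions made here for the example.

```python
import random

def sample_counts(rng, p, n):
    """Draw one multinomial count vector of size n with cell probabilities p."""
    counts = [0] * len(p)
    for cell in rng.choices(range(len(p)), weights=p, k=n):
        counts[cell] += 1
    return counts

def relative_saving(populations, n, reps, seed=0):
    """Estimate (R(MLE) - R(EB)) / R(MLE) under the summed squared-error loss."""
    rng = random.Random(seed)
    m, k = len(populations), len(populations[0])
    loss_mle = loss_eb = 0.0
    for _ in range(reps):
        counts = [sample_counts(rng, p, n) for p in populations]
        y = sum(x * x for row in counts for x in row)
        z = (k * y - m * n * n) / ((k - 1) * m * n * n)   # z = 1/(1 + k*lambda)
        for row, p in zip(counts, populations):
            for x, pj in zip(row, p):
                eb = (k * z * x + 1 - z) / (k * (1 + (n - 1) * z))
                loss_mle += (x / n - pj) ** 2
                loss_eb += (eb - pj) ** 2
    # The common factors n and 1/reps cancel in the ratio.
    return (loss_mle - loss_eb) / loss_mle

uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in range(10)]   # m = 10, k = 3 (assumed p)
print(relative_saving(uniform, n=10, reps=200))
```

With equal cell probabilities the estimator shrinks each row toward the true value $1/k$, so the saving is large in this case; values of $p$ near the boundary $q = 0$ would give a less favourable comparison.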

